Using Commonsense Knowledge to Automatically Create (Noisy) Training Examples from Text

Authors

  • Sriraam Natarajan
  • Jose Picado
  • Tushar Khot
  • Kristian Kersting
  • Christopher Ré
  • Jude W. Shavlik
Abstract

One of the challenges in information extraction is the requirement for human-annotated examples. Current successful approaches alleviate this problem by employing some form of distant supervision, i.e., looking to knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant supervision methods rely on hand-coded background knowledge that explicitly looks for patterns in text. In this work, we take a different approach: we create weakly supervised examples for relations by using commonsense knowledge. The key innovation is that this commonsense knowledge is completely independent of the natural language text. This helps when learning the full model for information extraction, as opposed to simply learning the parameters of a known CRF or MLN. We demonstrate on two domains that this form of weak supervision yields superior results when learning structure compared to simply using the gold standard labels.

Similar resources

Using Scripts to help in Biomedical Text Interpretation

This short note speculates on the use of world knowledge to help interpret a short paragraph of biomedical text about transcytosis (transport across a cell). The ultimate goal is to create a simple representation of the transcytosis process from the text (either automatically or semi-automatically). The challenges are formidable, as the process involves several steps that are often...

Online Inference-Rule Learning from Natural-Language Extractions

In this paper, we consider the problem of learning commonsense knowledge in the form of first-order rules from incomplete and noisy natural-language extractions produced by an off-the-shelf information extraction (IE) system. Much of the information conveyed in text must be inferred from what is explicitly stated since easily inferable facts are rarely mentioned. The proposed rule learner accou...

Extracting Glosses to Disambiguate Word Senses

Like most natural language disambiguation tasks, word sense disambiguation (WSD) requires world knowledge for accurate predictions. Several proxies for this knowledge have been investigated, including labeled corpora, user-contributed knowledge, and machine readable dictionaries, but each of these proxies requires significant manual effort to create, and they do not cover all of the ambiguous t...

Commonsense for Making Sense of Data

In my doctoral research, I address the problem of automatically acquiring commonsense knowledge from text corpora and also from data-sets containing visuals (images, videos) along with textual descriptions. I also aim to exploit the acquired commonsense knowledge for domain-specific and domain-independent applications such as fine-grained search, retrieval and prediction, data integration and a...

Commonsense from the Web: Relation Properties

When general purpose software agents fail, it’s often because they’re brittle and need more background commonsense knowledge. In this paper we present relation properties as a valuable type of commonsense knowledge that can be automatically inferred at scale by reading the Web. People base many commonsense inferences on their knowledge of relation properties such as functionality, transitivity,...

Publication date: 2013